Sampling Random Spanning Trees Faster than Matrix Multiplication
We present an algorithm that, with high probability, generates a random
spanning tree from an edge-weighted undirected graph in
time (The notation hides
factors). The tree is sampled from a distribution
where the probability of each tree is proportional to the product of its edge
weights. This improves upon the previous best algorithm due to Colbourn et al.
that runs in matrix multiplication time, . For the special case of
unweighted graphs, this improves upon the best previously known running time of
for (Colbourn
et al. '96, Kelner-Madry '09, Madry et al. '15).
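To make the target distribution concrete, here is a brute-force sketch for tiny graphs. It is exponential-time and is not the algorithm described in this abstract; all helper names are hypothetical. It enumerates every spanning tree, weights each tree by the product of its edge weights, and samples from the normalized weights. It also allows checking two classical facts: Kirchhoff's matrix-tree theorem (the total tree weight equals the determinant of the Laplacian with one row and column removed) and the identity that edge e = (u, v) lies in the random tree with probability w_e times the effective resistance between u and v.

```python
import itertools
import random

import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian of an undirected graph given as (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def is_spanning_tree(n, tree):
    """n - 1 edges with no cycle (checked by union-find) form a spanning tree."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in tree:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return len(tree) == n - 1


def tree_distribution(n, edges):
    """All spanning trees and their probabilities, P(T) proportional to the product of weights."""
    trees = [t for t in itertools.combinations(edges, n - 1) if is_spanning_tree(n, t)]
    tree_weights = np.array([np.prod([w for _, _, w in t]) for t in trees])
    return trees, tree_weights / tree_weights.sum()


def sample_tree(n, edges, rng=random):
    """Draw one spanning tree from the weighted distribution (exponential time)."""
    trees, probs = tree_distribution(n, edges)
    return rng.choices(trees, weights=probs, k=1)[0]


def effective_resistance(L, u, v):
    """(e_u - e_v)^T L^+ (e_u - e_v), using the Moore-Penrose pseudoinverse."""
    chi = np.zeros(L.shape[0])
    chi[u], chi[v] = 1.0, -1.0
    return chi @ np.linalg.pinv(L) @ chi


if __name__ == "__main__":
    edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 3.0), (0, 2, 1.0)]
    print(sample_tree(4, edges, random.Random(0)))
```

Enumeration is only feasible for a handful of vertices; its value here is as a ground truth against which faster samplers can be checked.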
The effective resistance metric is essential to our algorithm, as in the work
of Madry et al., but we eschew determinant-based and random walk-based
techniques used by previous algorithms. Instead, our algorithm is based on
Gaussian elimination, and the fact that effective resistance is preserved in
the graph resulting from eliminating a subset of vertices (called a Schur
complement). As part of our algorithm, we show how to compute
-approximate effective resistances for a set of vertex pairs via
approximate Schur complements in time,
without using the Johnson-Lindenstrauss lemma which requires time. We
combine this approximation procedure with an error correction procedure for
handling edges where our estimate isn't sufficiently accurate.
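The fact that effective resistance survives Gaussian elimination can be checked numerically with a small dense sketch. It uses exact Schur complements, whereas the abstract relies on approximate ones; the function names and the example graph are assumptions for illustration. Eliminating every vertex outside `keep` yields L_KK - L_KE L_EE^{-1} L_EK, which is again a graph Laplacian on the kept vertices, and effective resistances between kept vertices are unchanged.

```python
import numpy as np


def laplacian(n, edges):
    """Weighted Laplacian of an undirected graph given as (u, v, w) triples."""
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L


def schur_complement(L, keep):
    """Eliminate every vertex not in `keep`: L_KK - L_KE L_EE^{-1} L_EK (exact, dense)."""
    elim = [i for i in range(L.shape[0]) if i not in keep]
    L_kk = L[np.ix_(keep, keep)]
    L_ke = L[np.ix_(keep, elim)]
    L_ek = L[np.ix_(elim, keep)]
    # L_ee is nonsingular when the graph is connected and `keep` is nonempty.
    L_ee = L[np.ix_(elim, elim)]
    return L_kk - L_ke @ np.linalg.solve(L_ee, L_ek)


def effective_resistance(L, u, v):
    """(e_u - e_v)^T L^+ (e_u - e_v), using the Moore-Penrose pseudoinverse."""
    chi = np.zeros(L.shape[0])
    chi[u], chi[v] = 1.0, -1.0
    return chi @ np.linalg.pinv(L) @ chi


if __name__ == "__main__":
    edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 4, 3.0), (4, 0, 1.0), (1, 3, 2.0)]
    L = laplacian(5, edges)
    S = schur_complement(L, [0, 2, 4])
    # Vertices 0 and 2 of the original graph are indices 0 and 1 of S.
    print(effective_resistance(L, 0, 2), effective_resistance(S, 0, 1))
```

A dense solve like this costs cubic time; the point of approximate Schur complements is to obtain a sparse stand-in for S much faster.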
Solving Directed Laplacian Systems in Nearly-Linear Time through Sparse LU Factorizations
We show how to solve directed Laplacian systems in nearly-linear time. Given
a linear system in an Eulerian directed Laplacian with nonzero
entries, we show how to compute an -approximate solution in time . Through reductions from [Cohen et al.
FOCS'16], this gives the first nearly-linear time algorithms for computing
-approximate solutions to row or column diagonally dominant linear
systems (including arbitrary directed Laplacians) and computing
-approximations to various properties of random walks on directed
graphs, including stationary distributions, personalized PageRank vectors,
hitting times, and escape probabilities. These bounds improve upon the recent
almost-linear algorithms of [Cohen et al. STOC'17], which gave an algorithm to
solve Eulerian Laplacian systems in time .
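As a point of reference for how directed Laplacians encode random-walk quantities, the following dense sketch computes a stationary distribution directly from the kernel of L. It assumes the convention L = D_out - A^T (A[i, j] is the weight of arc i -> j) and a strongly connected graph; it is a cubic-time baseline, not the nearly-linear reductions cited above, and the function names are hypothetical. With x = D_out^{-1} pi, the equation L x = 0 is equivalent to pi = W pi for the column-stochastic walk matrix W = A^T D_out^{-1}. Eulerian graphs are exactly those whose Laplacian rows, as well as columns, sum to zero.

```python
import numpy as np


def directed_laplacian(n, arcs):
    """Assumed convention: L = D_out - A^T, with A[i, j] the weight of arc i -> j."""
    A = np.zeros((n, n))
    for i, j, w in arcs:
        A[i, j] += w
    return np.diag(A.sum(axis=1)) - A.T, A


def is_eulerian(L, tol=1e-12):
    """Columns of D_out - A^T always sum to 0; rows do too iff in-degree == out-degree."""
    return bool(np.allclose(L.sum(axis=1), 0.0, atol=tol))


def stationary_distribution(n, arcs):
    """Dense baseline for a strongly connected graph: pi = D_out x with L x = 0."""
    L, A = directed_laplacian(n, arcs)
    _, _, vt = np.linalg.svd(L)
    x = vt[-1]              # right singular vector for the smallest singular value (L x ~ 0)
    pi = A.sum(axis=1) * x  # pi = D_out x
    return pi / pi.sum()    # normalizing also fixes the arbitrary sign of x


if __name__ == "__main__":
    arcs = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0), (0, 2, 1.0)]
    L, _ = directed_laplacian(3, arcs)
    print(is_eulerian(L), stationary_distribution(3, arcs))
```

For an Eulerian graph the all-ones vector lies in the kernel, so the stationary distribution is proportional to the out-degrees.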
To achieve our results, we provide a structural result that we believe is of
independent interest. We show that Laplacians of all strongly connected
directed graphs have sparse approximate LU-factorizations. That is, for every
such directed Laplacian , there is a lower triangular matrix
and an upper triangular matrix
, each with at most
nonzero entries, such that their product spectrally approximates
in an appropriate norm. This claim can be viewed as an analogue of recent work
on sparse Cholesky factorizations of Laplacians of undirected graphs. We show
how to construct such factorizations in nearly-linear time and prove that, once
constructed, they yield nearly-linear time algorithms for solving directed
Laplacian systems. Comment: Appeared in FOCS 201
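For comparison with the sparse approximate factorizations described above, here is a sketch of the exact dense LU factorization that Gaussian elimination produces on a directed Laplacian. It does not construct the sparse approximate factors from the abstract; exact factors generally suffer fill-in. The convention L = D_out - A^T and all helper names are assumptions. For a strongly connected digraph every proper leading principal minor is nonzero, so elimination without pivoting never divides by zero, and the last diagonal entry of U is 0 because the Laplacian is singular. Each elimination step produces a Schur complement that is again a directed Laplacian, and it stays Eulerian when the input is.

```python
import numpy as np


def directed_laplacian(n, arcs):
    """Assumed convention: L = D_out - A^T, with A[i, j] the weight of arc i -> j."""
    A = np.zeros((n, n))
    for i, j, w in arcs:
        A[i, j] += w
    return np.diag(A.sum(axis=1)) - A.T


def lu_no_pivot(M):
    """Dense Doolittle LU by Gaussian elimination without pivoting: M = Lo @ U."""
    n = M.shape[0]
    U = np.array(M, dtype=float)
    Lo = np.eye(n)
    for k in range(n - 1):
        # Pivot U[k, k] is nonzero for a strongly connected directed Laplacian.
        for i in range(k + 1, n):
            Lo[i, k] = U[i, k] / U[k, k]
            U[i, k + 1:] -= Lo[i, k] * U[k, k + 1:]
            U[i, k] = 0.0
    return Lo, U


def eliminate_first(M):
    """One elimination step: the Schur complement onto vertices 1..n-1."""
    return M[1:, 1:] - np.outer(M[1:, 0], M[0, 1:]) / M[0, 0]


if __name__ == "__main__":
    arcs = [(0, 1, 2.0), (1, 2, 2.0), (2, 0, 2.0), (0, 2, 1.0),
            (2, 1, 1.0), (1, 0, 1.0), (2, 3, 1.0), (3, 2, 1.0)]
    Lo, U = lu_no_pivot(directed_laplacian(4, arcs))
    print(Lo, U, sep="\n")
```

The example graph is Eulerian (every vertex has equal in- and out-degree), which is the case the nearly-linear solver targets.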
Optimal Sketching Bounds for Sparse Linear Regression
We study oblivious sketching for -sparse linear regression under various
loss functions such as an norm, or from a broad class of hinge-like
loss functions, which includes the logistic and ReLU losses. We show that for
sparse norm regression, there is a distribution over oblivious
sketches with rows, which is tight up to a
constant factor. This extends to loss with an additional additive
term in the upper bound. This
establishes a surprising separation from the related sparse recovery problem,
which is an important special case of sparse regression. For this problem,
under the norm, we observe an upper bound of rows, showing that sparse recovery is
strictly easier to sketch than sparse regression. For sparse regression under
hinge-like loss functions including sparse logistic and sparse ReLU regression,
we give the first known sketching bounds that achieve rows showing that
rows suffice, where
is a natural complexity parameter needed to obtain relative error bounds for
these loss functions. We again show that this dimension is tight, up to lower
order terms and the dependence on . Finally, we show that similar
sketching bounds can be achieved for LASSO regression, a popular convex
relaxation of sparse regression, where one aims to minimize
over . We show that sketching
dimension suffices and that the dependence
on and is tight. Comment: AISTATS 202
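The sketch-and-solve pattern behind oblivious sketching for sparse regression can be illustrated with a toy example. This is an assumption-laden illustration, not the construction or bounds from the abstract: it uses a dense Gaussian sketch, plain least-squares (ℓ2) loss, an arbitrary number of rows m, and brute force over all size-k supports; the function names are hypothetical. The sketch matrix S is drawn without looking at A or b (hence "oblivious"), and the k-sparse problem is then solved on the much shorter pair (SA, Sb).

```python
import itertools

import numpy as np


def best_k_sparse(A, b, k):
    """Brute force over all size-k supports; returns the k-sparse minimizer of ||Ax - b||_2."""
    d = A.shape[1]
    best_x, best_cost = None, np.inf
    for support in itertools.combinations(range(d), k):
        cols = list(support)
        x_s, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
        cost = np.linalg.norm(A[:, cols] @ x_s - b)
        if cost < best_cost:
            x = np.zeros(d)
            x[cols] = x_s
            best_x, best_cost = x, cost
    return best_x


def sketched_sparse_regression(A, b, k, m, seed=0):
    """Draw a Gaussian sketch S (m x n) independently of A, b; solve on (S A, S b)."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((m, A.shape[0])) / np.sqrt(m)  # E[||S y||^2] = ||y||^2
    return best_k_sparse(S @ A, S @ b, k)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    A = rng.standard_normal((400, 8))
    x_true = np.zeros(8)
    x_true[[1, 5]] = [2.0, -3.0]
    b = A @ x_true + 0.1 * rng.standard_normal(400)
    x_sk = sketched_sparse_regression(A, b, k=2, m=60)
    x_opt = best_k_sparse(A, b, k=2)
    print(np.linalg.norm(A @ x_sk - b) / np.linalg.norm(A @ x_opt - b))
```

The printed ratio compares the sketched solution's cost on the full data with the true optimum; the abstract's results concern how small m can be while keeping this ratio within 1 + ε.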
Efficient Second-Order Shape-Constrained Function Fitting
We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in weighted- norm. We give a single algorithm that works for a variety of commonly studied shape constraints including monotonicity, Lipschitz-continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algorithm computes an approximation with additive error in time, where captures the range of input values. We also give a simple greedy algorithm that runs in time for the special case of unweighted convex regression. These are the first (near-)linear-time algorithms for second-order-constrained function fitting. To achieve these results, we use a novel geometric interpretation of the underlying dynamic programming problem. We further show that a generalization of the corresponding problems to directed acyclic graphs (DAGs) is as difficult as linear programming.
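As a concrete instance of the simplest shape constraint mentioned above, monotonicity, here is the classical pool-adjacent-violators algorithm (PAVA) for weighted isotonic regression. This is a swapped-in, well-known technique rather than the paper's algorithm: it handles only the first-order constraint and assumes a weighted squared-error (ℓ2) loss, which may differ from the norm used in the paper. The function name is hypothetical. PAVA scans left to right and merges adjacent blocks into their weighted mean whenever they violate the nondecreasing order.

```python
def pava(y, w=None):
    """Weighted nondecreasing least-squares fit by pool-adjacent-violators."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


if __name__ == "__main__":
    print(pava([1, 3, 2, 4]))  # the violating pair 3, 2 is pooled to 2.5
```

Each point is merged into a block at most once, so the scan runs in linear time; second-order constraints such as convexity do not admit such a simple pooling rule, which is where the abstract's geometric approach comes in.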
The Cosmos of a Public Sector Township: Democracy as an Intellectual Culture
The public sector plays an important role in responding to the rights of citizens and evolving norms of social interest (Qu 2015). Qu argues that the nature of public enterprise is never final and that there is a constant negotiation between the private and the public emergence of life and rights. One such space where the tension between the private and the public manifests itself is the public sector township, or residential colony, in India. The sociality of hierarchy in public sector organizations manifests itself in the public sector township and may nurture everyday aspirations, angsts and divides. The officer lives in a bigger home, a bungalow, while the clerk lives in a smaller home, often with a larger family. [excerpt]
Multiple novel prostate cancer susceptibility signals identified by fine-mapping of known risk loci among Europeans
Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have
fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in
25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16
regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of
correlated variation was observed at 39 regions, 35 of which are now described by a novel, more significantly associated lead SNP,
while the originally reported variant remained as the lead SNP only in 4 regions. We also confirmed two association signals in
Europeans that had been previously reported only in East-Asian GWAS. Based on statistical evidence and linkage disequilibrium
(LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region.
Functional annotation using data from ENCODE filtered for PrCa cell lines and eQTL analysis demonstrated significant
enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the
refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa,
an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of
PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent
signals within the same region.
Repositioning of the global epicentre of non-optimal cholesterol
High blood cholesterol is typically considered a feature of wealthy western countries(1,2). However, dietary and behavioural determinants of blood cholesterol are changing rapidly throughout the world(3) and countries are using lipid-lowering medications at varying rates. These changes can have distinct effects on the levels of high-density lipoprotein (HDL) cholesterol and non-HDL cholesterol, which have different effects on human health(4,5). However, the trends of HDL and non-HDL cholesterol levels over time have not been previously reported in a global analysis. Here we pooled 1,127 population-based studies that measured blood lipids in 102.6 million individuals aged 18 years and older to estimate trends from 1980 to 2018 in mean total, non-HDL and HDL cholesterol levels for 200 countries. Globally, there was little change in total or non-HDL cholesterol from 1980 to 2018. This was a net effect of increases in low- and middle-income countries, especially in east and southeast Asia, and decreases in high-income western countries, especially those in northwestern Europe, and in central and eastern Europe. As a result, countries with the highest level of non-HDL cholesterol, which is a marker of cardiovascular risk, changed from those in western Europe such as Belgium, Finland, Greenland, Iceland, Norway, Sweden, Switzerland and Malta in 1980 to those in Asia and the Pacific, such as Tokelau, Malaysia, The Philippines and Thailand. In 2017, high non-HDL cholesterol was responsible for an estimated 3.9 million (95% credible interval 3.7 million-4.2 million) worldwide deaths, half of which occurred in east, southeast and south Asia. 
The global repositioning of lipid-related risk, with non-optimal cholesterol shifting from a distinct feature of high-income countries in northwestern Europe, North America and Australasia to one that affects countries in east and southeast Asia and Oceania, should motivate the use of population-based policies and personal interventions to improve nutrition and enhance access to treatment throughout the world. Peer reviewed
Global variations in diabetes mellitus based on fasting glucose and haemoglobin A1c
Fasting plasma glucose (FPG) and haemoglobin A1c (HbA1c) are both used to diagnose
diabetes, but may identify different people as having diabetes. We used data from 117
population-based studies and quantified, in different world regions, the prevalence of
diagnosed diabetes, and whether those who were previously undiagnosed and detected
as having diabetes in survey screening had elevated FPG, HbA1c, or both. We developed
prediction equations for estimating the probability that a person without previously
diagnosed diabetes, and at a specific level of FPG, had elevated HbA1c, and vice versa.
The age-standardised proportion of diabetes that was previously undiagnosed, and
detected in survey screening, ranged from 30% in the high-income western region to 66%
in south Asia. Among those with screen-detected diabetes with either test, the age-standardised
proportion who had elevated levels of both FPG and HbA1c was 29-39%
across regions; the remainder had discordant elevation of FPG or HbA1c. In most low- and
middle-income regions, isolated elevated HbA1c was more common than isolated elevated
FPG. In these regions, the use of FPG alone may delay diabetes diagnosis and
underestimate diabetes prevalence. Our prediction equations help allocate finite
resources for measuring HbA1c to reduce the global gap in diabetes diagnosis and
surveillance. Peer-reviewed